Summary


Different  methods  of  eliciting  responses  to  the  same  question  often 
produce  different  responses.  In  order  to  systematically  study  how  response 
scales  affect  likelihood  ratio  judgments,  two  experiments  were  conducted. 
Experiment I manipulated two independent variables: the endpoints of the
response scales (100:1, 1000:1, 10,000:1) and the spacing of the scales
(logarithmic versus linear). Results compared the veridicality of responses
on the six scales produced by crossing these factors plus another response
mode in which subjects simply wrote their judgment in a blank (no scale).

Logarithmic  scales  produced  responses  that  were  both  more  veridical  and 
more  consistent  than  responses  on  linear  scales  which  were,  in  turn,  better 
than  simple  written  responses.  Measures  of  the  effect  of  the  endpoints  were 
somewhat  inconsistent  and  probably  interacted  with  the  range  of  veridical 
likelihood  ratios.  Judgments  of  relatively  small  likelihood  ratios  were 
affected  by  the  spacing:  linear  spacing  caused  overestimation.  Judgments  of 
relatively  large  likelihood  ratios  were  controlled  more  by  the  endpoints: 
higher  endpoints  produced  larger  judgments.  Apparently,  subjects  use  the 
range  of  the  scale  as  information  about  the  range  of  true  likelihood  ratios. 

Experiment  II  manipulated  two  additional  variables,  data  diagnosticity 
and  the  values  of  the  true  likelihood  ratios.  The  results  of  Experiment  I 
were  confirmed  while  neither  of  the  additional  variables  radically  changed 

Judgments change in response to the information provided in the surrounding
environment, regardless of whether or not the information is
relevant  to  the  judgment.  Changing  judgments  in  response  to  irrelevant 
information  will  usually  lead  to  inconsistencies  among  judgments.  Such 
inconsistencies  pose  a particular  problem  when  the  judgments  serve  as  the 
basis  for  making  decisions,  as  in  decision  analysis.  Subjective  judgments 
of both probability and utility are required for decision analysis, and these
judgments are known to be inconsistent in certain situations. For example,
different methods of eliciting subjective probability distributions will
produce different distributions (Seaver, von Winterfeldt, & Edwards, 1975;
Stael von Holstein, 1971; Winkler, 1967). The questions asked to determine
subjective  probability  distributions  include  no  information  that  should  cause 
the  subjective  distributions  to  change,  yet  consistent  differences  do  occur. 

If different elicitation methods lead to differences in the assessed
probabilities, these differences need to be eliminated or taken into account.
One way to approach this problem is to learn what causes these inconsistencies.
For example, the type of response required affects the judgments.
Responses to the same questions in odds and probabilities will typically not
be equivalent, as has been shown both in probabilistic inference tasks
(Fujii, 1967; Phillips & Edwards, 1966) and in the assessment of subjective
probability distributions (Seaver et al., 1975). In fact, even if the same
type of response is required, the way in which it is recorded seems to
systematically change the responses. Posterior odds judgments in probabilistic
inference tasks have usually been larger when recorded on a logarithmic
scale than when simply written (Fujii, 1967; Phillips & Edwards, 1966). A
similar difference has been shown between likelihood ratio judgments
recorded on logarithmic scales and those that were written (Domas, Goodman,
& Peterson, 1972). Goodman (1973), in reanalyzing the data from several
experiments  (including  Domas  et  al.)  to  determine  the  effects  of  several 
variables  on  judgments  of  uncertainty,  concluded  that  judgments  recorded  on 
logarithmic  scales  were  generally  larger  regardless  of  accuracy.  In  some 
instances  the  larger  responses  were  more  veridical,  while  other  times  they 
were  less  veridical. 

An unpublished pilot study by Seaver and von Winterfeldt conducted prior
to the Seaver et al. experiment also suggested that another response scale
variable, the upper endpoint, may affect odds or likelihood ratio judgments.
Although  the  scale  endpoints  were  not  systematically  manipulated,  subjects' 
responses  were  apparently  influenced  by  the  endpoints.  When  subjects  were 
very  certain,  they  tended  to  respond  with  the  scale  endpoint  regardless  of 
its  value,  even  though  they  had  been  instructed  to  respond  off  the  scale 
if  necessary. 

The  current  experiments  were  undertaken  to  systematically  explore  how 
variations  in  the  response  scale  affect  likelihood  ratio  judgments.  In 
particular,  we  were  interested  in  the  differences  between  responses  on 
logarithmic  scales,  linear  scales,  and  no  scales;  and  in  how  the  upper 
endpoints  of  the  scales  affect  the  responses.  Knowledge  of  such  differences 
should  be  of  practical  use  to  those  who  seek  accurate  quantification  of 
uncertainty. 
II. Experiment I

II.1. Method

II.1.1. Subjects. The subjects were 74 undergraduate students at the
University  of  Southern  California  enrolled  in  an  introductory  psychology 
course.  Participation  in  several  experiments  throughout  the  semester  was 
required  for  credit  in  the  course. 

II.1.2. Apparatus. Stimuli for the experiment were seven-inch (17.78
cm) sticks with one end painted red and the remainder of the stick painted
white. Each stick represented a sample from one of two populations of
sticks, normally distributed in red length with means of five inches
(12.7 cm) and two inches (5.08 cm), respectively, and a common standard
deviation of one inch (2.54 cm). The lengths of red and white were varied to
produce true likelihood ratios from 2:1 to 12,000:1. Each of twenty-five
different normal deviates was used to produce two sticks, one with more
red than white and one with more white than red.
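
As an illustration of how true likelihood ratios follow from the population parameters just described, the sketch below computes the likelihood ratio for a stick with a given red length. The function names are hypothetical and are not taken from the study materials; the only values drawn from the text are the two means (5 and 2 inches) and the common standard deviation (1 inch).

```python
import math

MEAN_LONG = 5.0   # mean red length (inches) of the first population
MEAN_SHORT = 2.0  # mean red length (inches) of the second population
SD = 1.0          # common standard deviation (inches)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(red_length):
    """Likelihood ratio favoring the more likely population (always >= 1)."""
    lr = normal_pdf(red_length, MEAN_LONG, SD) / normal_pdf(red_length, MEAN_SHORT, SD)
    return lr if lr >= 1.0 else 1.0 / lr


def red_length_for(lr):
    """Red length above the midpoint that yields a given likelihood ratio."""
    d_prime = (MEAN_LONG - MEAN_SHORT) / SD
    midpoint = (MEAN_LONG + MEAN_SHORT) / 2.0
    return midpoint + SD * math.log(lr) / d_prime
```

Under these assumptions, likelihood ratios of 2:1 and 12,000:1 correspond to red lengths of roughly 3.7 and 6.6 inches, both of which fit on a seven-inch stick.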

The population characteristics of the sticks were displayed to the subjects
by two histograms, each a representative sample of one hundred sticks
from one of the populations. These sticks were selected from a normal distribution
and were spaced equidistant on the distribution function from
minus to plus three standard deviations. The sticks from each population were
randomly  arranged  to  form  the  respective  histograms.  The  displays  were  the 
actual  size  and  color  of  the  original  stick  populations  with  the  population 
mean  displayed  by  a heavy  yellow  horizontal  line.  These  displays  were 
visible  to  the  subjects  throughout  the  experiment. 

Seven  different  response  scales  were  used:  three  with  logarithmically 
spaced  markings  and  upper  endpoints  of  100:1,  1000:1,  and  10,000:1;  three  with 
linearly spaced markings and the same endpoints; and one with simply a blank
to fill in. Henceforth these scales will be referred to as log100,
log1000, log10000, lin100, lin1000, lin10000, and open. Each individual
recorded  responses,  one  to  a page,  in  a booklet  containing  only  a single 
type  of  response  scale. 

II.1.3. Procedure. Subjects received written instructions explaining
the nature of the task and the experimental stimuli. The display histograms
were described as random samples from the two populations. The written
instructions  further  directed  subjects  that  certainty  was  to  be  expressed  in 
likelihood  ratios  and  explained  the  concept  of  likelihood  ratios. 

Following  the  review  of  the  written  instructions,  a short  example  of 
the two-hypothesis likelihood ratio estimation procedure was explained
verbally. Both written and verbal instructions emphasized that when subjects'
likelihood ratio estimates were greater than those provided on the
scale,  they  were  to  make  a mark  at  the  top  of  the  scale  and  simply  write 
their  numerical  judgment. 

Subjects  then  viewed  the  50  sticks,  one  at  a time,  and  responded  with 
likelihood  ratio  judgments  on  the  appropriate  scales.  The  subjects  were 
allowed  to  pick  up  the  sticks  or  move  them  to  get  a better  perspective,  but 
were  not  allowed  to  compare  them  with  previous  sticks.  For  each  stick  the 
subjects  chose  which  population  was  more  likely  to  have  produced  the  stick 
and  indicated  a likelihood  ratio  corresponding  to  their  certainty. 

The  sticks  were  presented  in  four  different  randomized  orders.  Subjects 
were  run  in  self-selected  groups  of  from  three  to  seven  persons  based  on 
the  time  for  which  they  registered  on  a sign-up  sheet.  Different  response 
scales  were  assigned  randomly  to  groups.  The  number  of  subjects  using  each 
of the response scales was 11, 14, 9, 10, 9, 9, and 12 for the lin100,
lin1000, lin10000, log100, log1000, log10000, and open scales, respectively.
Unequal numbers resulted from the failure of some subjects to follow directions
properly in making their responses.

II.2. Results

The  data  were  subjected  to  a logarithmic  transformation  and  all  analyses 
were  performed  on  the  transformed  data.  The  likelihood  ratio  responses  were 
regressed  on  the  true  likelihood  ratios  for  each  individual  subject.  Table  1 
shows  the  individual  correlations  from  these  analyses  and  the  mean  correlations 
for  each  response  scale  calculated  using  the  Fisher-z  transformation.  The 
relatively  large  number  of  subjects  with  nonsignificant  correlations  (p>.05) 
suggests  considerable  unreliability  in  some  subjects'  responses.  This 
unreliability  is  more  pronounced  in  subjects  responding  on  linear  scales 
(9  out  of  34  subjects)  than  in  subjects  responding  on  logarithmic  scales 
(1 out of 28 subjects). With the unreliability due to subjects with nonsignificant
correlations removed, little, if any, difference exists among
mean  correlations. 
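
The per-subject analysis described above can be sketched as follows. This is a minimal illustration, not the authors' original computation: the function names are hypothetical, and base-10 logarithms are assumed because the text later equates 100:1 with 2.0 on the logarithmic scale.

```python
import math


def log_log_regression(true_lrs, responses):
    """Regress log10(response) on log10(true LR); return (r, slope, intercept)."""
    xs = [math.log10(v) for v in true_lrs]
    ys = [math.log10(v) for v in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return r, slope, intercept


def fisher_z_mean(correlations):
    """Average correlations via Fisher's z: atanh each r, average, then tanh.

    Each correlation must lie strictly between -1 and 1.
    """
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))
```

A subject whose responses matched the true likelihood ratios exactly would produce a slope of 1.0 and an intercept of 0.0 in this regression.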

Table  2 shows  the  slopes  and  intercepts  of  the  individual  regression 
analyses.  The  mean  slopes  and  intercepts  for  each  response  scale  were 
calculated excluding the subjects with nonsignificant correlations. A perfect
correspondence between responses and true likelihood ratios would
result  in  a slope  of  1.0  and  an  intercept  of  0.0.  The  most  striking  result 
is  the  difference  in  intercepts  between  linear  and  logarithmic  scales. 
Intercepts  on  the  logarithmic  scales  are  consistently  lower  (closer  to  0.0) 
than intercepts on the linear scales. The slopes also tend to increase as
the endpoint of the scales increases, with the exception of the lin1000 response
scale.

Table 1

Correlations Between True Likelihood Ratios and
Responses for Individual Subjects
(Experiment I)

Note: Nonsignificant correlations are in parentheses. N in parentheses is the
number of subjects in the given response mode with nonsignificant correlations.
Mean correlations in parentheses are calculated for response mode groups with
nonsignificant correlations removed.

Note: Parentheses indicate that the correlation for the subject was nonsignificant.
N in parentheses is the number of subjects in the given response mode with
nonsignificant correlations. Mean slopes and intercepts in parentheses are
calculated for response mode groups with individuals with nonsignificant
correlations removed.

To  provide  numbers  that  represent  each  response  scale  without  being 
influenced by the unreliability of the data, median responses were computed
across  subjects  for  each  response  scale  at  each  of  the  25  true  likelihood 
ratios.  Subjects  with  nonsignificant  correlations  were  removed  from  this 
computation.  The  individual  judgments  used  to  calculate  these  medians  were 
the  arithmetic  means  of  the  responses  to  the  two  sticks  with  the  same  true 
likelihood  ratio,  but  favoring  different  populations.  Scatterplots  of  these 
medians  and  the  regression  lines  and  statistics  are  shown  in  Figure  1. 

The  dependence  of  subjects'  likelihood  ratio  judgments  on  response 
scales  is  evidenced  in  several  ways.  Providing  any  scale  for  responses 
seems  to  increase  the  reliability  of  subjects'  judgments  as  shown  by  the  lower 
correlation  for  the  open  scale  compared  with  correlations  for  five  of  the 
other six response scales: only lin100 has a lower correlation. In
addition, all the correlations for logarithmic scales are noticeably higher
than any of the correlations for the linear scales, indicating that logarithmic
spacing increases reliability. The slopes of the logarithmic scales
are  also  generally  higher  (closer  to  1.0)  than  the  linear  or  open  scales 
and  the  intercepts  indicate  that  the  logarithmic  scales  are  superior  to  the 
linear  or  open  scales.  Thus,  all  three  statistics  favor  the  logarithmic 
scales  over  the  linear  and  open  scales. 

The  overall  effects  of  the  endpoints  are  less  clear.  The  slopes  obtained 
in  this  analysis  confirm  the  tendency  found  in  the  individual  data  for  the 
slopes  to  increase  as  the  endpoints  increase.  No  systematic  effects  on  the 
correlations  or  intercepts  are  apparent.  Not  surprisingly,  the  scatterplots 
show  that  the  endpoints  clearly  function  as  an  upper  bound  for  responses. 

Figure 1. Scatterplots and Regression Lines of Log Responses Versus
Log True Likelihood Ratios
(Experiment I)

Each scale with 100:1 as the endpoint has a maximum median response less
than  100:1  (2.0  on  the  logarithmic  scale).  A similar  effect  is  also 
apparent  for  the  other  endpoints.  In  this  respect  the  open  scale  seems  most 
similar  to  the  scales  with  100:1  endpoints. 

Because  the  large  number  of  true  likelihood  ratios  greater  than  100:1 
may  have  unduly  influenced  these  findings,  similar  regression  analyses  were 
performed  on  only  the  median  responses  to  true  likelihood  ratios  less  than 
100:1  (12  values).  Although  the  differences  are  less  dramatic,  the  same 
general  effects  were  found  in  this  restricted  range.  The  only  striking 
difference was in the lin10000 scale, where the slope increased to about 1.5
and  the  intercept  decreased  to  -0.1. 

Two  final  analyses,  consisting  of  six  planned  comparisons  each,  were 
performed to determine the effects of the response scales on the correspondence
between individual subjects' responses and true likelihood ratios
(Hays,  1973,  chapter  14).  The  measures  used  were  the  absolute  values  of 
the  differences  between  the  logarithm  of  the  response  and  the  logarithm  of 
the  true  likelihood  ratio.  The  six  comparisons  were  log  versus  linear,  log 
versus  open,  linear  versus  open,  100:1  endpoints  versus  1000:1  endpoints, 
100:1  endpoints  versus  10000:1  endpoints,  and  1000:1  endpoints  versus 
10000:1  endpoints.  These  comparisons  were  made  both  on  data  from  all 
subjects  and  on  data  from  only  those  subjects  with  significant  correlations 
(see Table 1).
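
The correspondence measure defined above, the absolute difference between the logarithm of the response and the logarithm of the true likelihood ratio, can be sketched as below. The function name is hypothetical, and base-10 logarithms are an assumption consistent with the rest of the analysis.

```python
import math


def mean_abs_log_deviation(true_lrs, responses):
    """Mean of |log10(response) - log10(true LR)| over paired judgments."""
    deviations = [
        abs(math.log10(response) - math.log10(true_lr))
        for true_lr, response in zip(true_lrs, responses)
    ]
    return sum(deviations) / len(deviations)
```

On this measure, a value of 1.0 means that responses differ from the true likelihood ratios by a factor of ten on average, in either direction.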

The means of this measure for each response scale and the marginal
means used in the planned comparisons are presented in Table 3.

Table 3

Mean Absolute Deviations Between Log Responses
and Log True Likelihood Ratios

                                Endpoints
Spacing          100:1       1000:1      10000:1     Marginal Means

Linear           .8296       .8421       1.2083      .9350
                 (.7694)     (.8185)     (.9909)     (.8483)

Logarithmic      .8547       .6670       1.1584      .8920
                 (.8547)     (.6670)     (1.1613)    (.8929)

Marginal Means   .8416       .7736       1.1834
                 (.8100)     (.7592)     (1.0761)

Open             1.1823
                 (1.1579)

Note: Numbers in parentheses exclude subjects with
nonsignificant correlations.

Significant differences (p<.01) yielded the following orders (from best to
worst) for data from all subjects.


1000:1 endpoints -> 100:1 endpoints -> 10000:1 endpoints
logarithmic -> linear -> open

Comparisons  using  data  from  only  those  subjects  with  significant  correlations 
resulted  in  the  following  orderings.  (All  differences  were  significant  at 
the  .01  level  except  the  1000:1  endpoints  versus  100:1  endpoints  which  was 
significant  at  the  .02  level.) 

1000:1 endpoints -> 100:1 endpoints -> 10000:1 endpoints
logarithmic -> linear -> open

II.3. Discussion

This  study  indicates  the  existence  of  consistent  biases  in  subjects' 
likelihood  ratio  judgments  that  are  dependent  upon  the  scale  on  which  the 
judgments  are  recorded.  Apparently  information  from  the  response  scales 
that  should  be  irrelevant  is  not  treated  as  such  by  the  subjects  when  making 
their  responses. 

The  two  factors  manipulated  in  this  study  affect  different  ranges  of 
likelihood  ratio  judgments.  The  spacing  of  the  scales  seems  to  control 
responses  to  relatively  small  likelihood  ratios,  while  the  scale  endpoints 
exert  more  control  over  larger  likelihood  ratio  judgments. 

The  logarithmic  scales  facilitated  responses  at  the  lower  end  of  the 
scales  leading  to  consistently  more  veridical  responses  than  the  linear 
scales. Subjects may have had more difficulty responding with small likelihood
ratios on the linear scales because the small likelihood ratios were
physically close together relative to the same likelihood ratios on the
logarithmic scales. For example, the distance between 1:1 and 10:1 on the
log10000 scale used in this study was approximately 7.85 cm, but was only
about .031 cm on the lin10000 scale. The physically small region available
for  low  responses  on  the  linear  scales  may  well  have  led  subjects  to  avoid 
responses  in  that  region.  The  relatively  high  intercepts  for  responses  on 
linear  scales  support  this  conjecture. 
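
The contrast in physical spacing can be illustrated with a short calculation. This sketch rests on assumptions not stated in the text: that both scales had the same physical length, that this length equals four decades at 7.85 cm each, and that the linear scale began at 1:1. The function names are hypothetical.

```python
import math

# Assumed total scale length: four decades (1:1 to 10,000:1) at 7.85 cm each.
SCALE_LENGTH_CM = 4 * 7.85


def log_position(lr, endpoint, length=SCALE_LENGTH_CM):
    """Distance (cm) from the 1:1 mark on a logarithmically spaced scale."""
    return length * math.log10(lr) / math.log10(endpoint)


def linear_position(lr, endpoint, length=SCALE_LENGTH_CM):
    """Distance (cm) from the 1:1 mark on a linearly spaced scale starting at 1:1."""
    return length * (lr - 1.0) / (endpoint - 1.0)
```

Under these assumptions the 10:1 mark sits 7.85 cm from 1:1 on the log10000 scale but only about three hundredths of a centimeter from it on the lin10000 scale, the same order of magnitude as the figure reported above.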

The  obvious  effect  of  the  scale  endpoints  is  that  they  serve  as  a 
ceiling  for  responses.  The  slopes  of  median  responses  also  generally  increased 
as  the  endpoints  increased.  Thus,  the  results  of  the  analysis  of  differences 
between  responses  and  true  likelihood  ratios  showing  responses  on  scales 
with  1000:1  endpoints  to  be  more  veridical  are  somewhat  surprising.  The 
conflict  between  these  results  is  probably  primarily  due  to  the  difference 
between  the  use  of  medians  and  means.  The  lack  of  a rationale  for  choosing 
between  these  statistics  suggests  that  conclusions  concerning  the  effect  of 
scale  endpoints  on  the  veridicality  of  judgments  should  not  be  drawn  without 
more  research. 

Use  of  the  open  scale  seems  inadvisable.  The  correlations  between 
median  responses  and  true  likelihood  ratios  indicated  the  open  scale  may 
produce  judgments  less  closely  tied  to  the  true  likelihood  ratios,  while 
analysis  of  the  differences  between  responses  and  true  likelihood  ratios 
showed  the  responses  were  less  veridical  on  open  scales  than  on  either 
logarithmic  or  linear  scales.  This  is  not  surprising  since  any  type  of 
judgment  would  be  expected  to  be  more  consistent  when  responses  are  made  on 
physical  scales  rather  than  simply  written. 

The  findings  of  Experiment  I are  consistent  with  the  results  reported 
by Domas et al. (1972) in that the slopes of the regression lines comparing
responses with true likelihood ratios are larger for logarithmically spaced
scales.  However,  the  slopes  are  less  than  1.0  rather  than  greater  as  found 
by  Domas  et  al.  This  difference  can  be  explained  by  the  relatively  large 
true likelihood ratios used in this study. Larger likelihood ratios typically
result in a decrease in the slope of such regression lines. Certain
other differences are also apparent. While Domas et al. attribute the
larger slopes with logarithmic scales to a tendency to make larger judgments,
in this study the larger slopes are probably at least partially due to the
increased use of small odds, and, therefore, intercepts closer to 0.0.
Domas et al. do not report the intercepts of their data, so a similar
comparison cannot be made. In this study any tendency to make larger judgments
seems more likely to be the result of higher endpoints rather than logarithmic
scales.

Several  conclusions  tentatively  can  be  drawn  from  this  study:  (1)  any 
scale  is  better  than  no  scale;  (2)  logarithmic  scales  are  better  than  linear 
scales; (3) the absolute magnitude of responses depends heavily on the
endpoint of the response scale. If these conclusions remain valid, they
have considerable practical implications for the elicitation of subjective
likelihood ratios. However, because of the apparent dependency of effects
on the values of likelihood ratios, the true likelihood ratios of the
stimuli  used  in  this  study  may  have  been  a critical  factor  in  determining 
the  overall  effects.  The  stimuli  used  had  a large  d'  and  a wide  range  of 
likelihood  ratios  with  relative  emphasis  on  large  likelihood  ratios.  Thus, 
they  are  quite  dissimilar  to  stimuli  used  in  other  laboratory  experiments 
which  typically  have  lower  values  of  d'  (usually  2.2  or  less)  and  true 
likelihood  ratios  more  concentrated  in  a lower  range.  In  order  to  explore 
how d' and the range of true likelihood ratios affect these results, a second
study was undertaken.

III.  Experiment  II 

Experiment  II  examined  two  factors  which  would  extend  knowledge  of  the  nature 
of the response mode phenomenon. A less extreme level of data diagnosticity,
represented by a d' of 1.5, was used along with the original level of 3.0. Also,
the  method  of  selection  of  true  likelihood  ratios  was  varied:  both  the  method 
used  in  Experiment  I resulting  in  true  likelihood  ratios  of  2:1  to  12,000:1  and 
a more  typical  method  of  generation  by  a normal  random  process  were  used.  Both 
of  these  factors  had  led  to  the  selection  of  generally  large  likelihood  ratios  in 
Experiment  I which  may  have  biased  the  results  in  favor  of  logarithmic  scales 
with  large  endpoints. 

III.1. Method

III.1.1. Subjects. One hundred and ninety-two undergraduates at the University
of  Southern  California  served  as  subjects  for  this  experiment  as  a requirement  for 
an  introductory  psychology  class.  Subjects  were  each  paid  $3.00  for  participation 
in  the  experiment. 

III.1.2. Procedure. The normal process underlying the generation of data was
the same as used in Experiment I, but the stimuli were changed. Subjects were
told that samples were taken from a series of lakes and that the growth of a certain
red algae was chemically analyzed. This red algae was said to be indicative of
the likelihood that the sampled lake was polluted at the time of the sample. Subjects
were told that, on the average, polluted lakes contained 38 parts per million
red  algae  growth,  while  nonpolluted  lakes  averaged  32  parts  per  million.  The 
standard  deviations  were  2.0  and  4.0  to  produce  the  two  levels  of  d'. 
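
The two levels of d' follow directly from the stated means and standard deviations, taking d' as the separation between the population means in standard deviation units. A minimal check, with a hypothetical function name:

```python
def d_prime(mean_a, mean_b, sd):
    """Separation of two equal-variance normal populations in SD units."""
    return abs(mean_a - mean_b) / sd


# Lake populations from the text: means of 38 and 32 parts per million.
high_diagnosticity = d_prime(38.0, 32.0, 2.0)  # standard deviation of 2.0
low_diagnosticity = d_prime(38.0, 32.0, 4.0)   # standard deviation of 4.0
```

The same function applied to the stick populations of Experiment I (means of 5 and 2 inches, standard deviation of 1 inch) gives the original d' of 3.0.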

The  original  range  of  likelihood  ratios  was  produced  as  in  Experiment 
I and  again  they  ranged  from  2:1  to  12,000:1  with  the  same  intermediate 
values  as  in  Experiment  I.  The  other  range  of  likelihood  ratios,  termed 
normal range likelihood ratios, was selected by a computer utility program
for  the  generation  of  normal  deviates  that  produced  a series  of  25  deviates 
from  a normal  population  with  a mean  of  zero  and  standard  deviation  of  1.0. 
These  deviates  were  then  converted  to  the  population  parameters  defined  in 
the study and the likelihood ratios were calculated. The resultant likelihood
ratio ranges varied from 1.13:1 to 55.8:1 with d'=1.5, and from
1.62:1  to  28,566:1  with  d'=3.0. 
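
The general procedure of drawing normal deviates, rescaling them to the population parameters, and computing likelihood ratios might look like the sketch below. It is only an illustration: the original utility program is not described, so the random number generator, the seed, and the function names are all assumptions, and the output will not reproduce the specific values used in the study.

```python
import math
import random


def draw_sample_values(n, mean, sd, seed=0):
    """Draw n standard normal deviates and rescale them to the given mean and SD."""
    rng = random.Random(seed)  # seed is arbitrary, chosen only for repeatability
    return [mean + sd * rng.gauss(0.0, 1.0) for _ in range(n)]


def likelihood_ratio(x, mean_a, mean_b, sd):
    """Likelihood ratio (>= 1) favoring whichever equal-variance population is closer to x."""
    d_prime = abs(mean_a - mean_b) / sd
    midpoint = (mean_a + mean_b) / 2.0
    return math.exp(d_prime * abs(x - midpoint) / sd)


# Example: 25 hypothetical samples from the polluted-lake population at d' = 1.5.
samples = draw_sample_values(25, mean=38.0, sd=4.0)
ratios = [likelihood_ratio(x, 38.0, 32.0, 4.0) for x in samples]
```

Because the log likelihood ratio grows with d' times the standardized distance from the midpoint, the same set of deviates yields a much wider range of likelihood ratios at d'=3.0 than at d'=1.5, which matches the pattern in the ranges reported above.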

Written instructions explained the nature of the task and the experimental
stimuli. Subjects were instructed to circle the more likely hypothesis
and  express  certainty  on  the  scale  provided  as  a likelihood  ratio  between 
the  two  competing  hypotheses.  The  concept  of  likelihood  ratios  was  explained 
in  more  specific  detail  than  in  the  first  experiment.  The  experimenter 
explained  that  the  midpoint  between  the  two  means  should  be  the  cutoff  between 
samples  favoring  either  hypothesis  and  that  the  more  extreme  the  sample  from 
this  midpoint,  the  higher  the  likelihood  ratio  should  be  in  favor  of  the 
hypothesis  on  that  side  of  the  midpoint.  As  in  the  first  experiment,  subjects 
were  told  that  if  the  likelihood  ratio  judgments  were  larger  than  provided 
for  on  the  scale,  they  were  to  mark  the  top  of  the  scale  and  write  their 
numerical  judgment.  Subjects  then  made  fifty  likelihood  ratio  judgments 
for samples from fifty hypothetical lakes. The samples were presented
in three different random orders.

Subjects  were  run  one  to  four  at  a time  in  self-selected  groups  based 
on  the  time  for  which  they  registered  on  a sign-up  sheet.  Twelve  subjects 
were  run  in  each  of  the  16  cells  of  a completely  crossed  2 x 2 x 2 x 2 
design.  The  factors  in  addition  to  d'  and  the  selection  procedure  for 
stimuli  were  again  spacing  (logarithmic  and  linear)  and  endpoints  (100:1 
and  10,000:1). 


Judgments  were  made  in  booklets  containing  response  scales  similar  to 
those  used  in  Experiment  I.  The  sample  result  from  the  red  algae  test 
appeared in the upper left corner of the response sheet with the words
"The  designated  lake  contains  Red  Algae  (Soficticus  Grahamae)  tested  at 
(sample  result)  parts  per  million.  It  is  more  likely  to  be  (polluted  or 
not  polluted)  with  a likelihood  of:". 

III.2. Results

All  data  were  again  transformed  logarithmically  and  all  analyses  were 
performed  on  the  transformed  data.  Likelihood  ratio  responses  were  regressed 
on  the  true  likelihood  ratios  for  each  subject.  Table  4 shows  the  individual 
correlations  from  these  analyses  and  the  mean  correlations,  calculated  using 
the Fisher-z transformation, for each of the 16 cells in the design. Comparing
across all levels of other factors, these correlations show no systematic
differences between logarithmically spaced scales and linearly spaced
scales.  Also,  no  systematic  differences  are  apparent  for  scales  with 
endpoints  of  100:1  versus  scales  with  endpoints  of  10,000:1.  Despite  the 
much  more  specific  instructions  and  detailed  explanation  of  the  method  for 
judging  likelihood  ratios,  the  relative  number  of  nonsignificant  (p>.05) 
and  negative  correlations  differs  little  from  Experiment  I (16.2%  in 
Experiment  I and  12.5%  in  Experiment  II),  although  the  difference  is  in  the 
expected  direction.  Subjects  with  nonsignificant  and  negative  correlations 
were  removed  from  all  subsequent  analyses. 

Table  5 shows  the  mean  slopes  and  intercepts  from  the  individual 
regression  analyses.  Again,  as  in  Experiment  I,  the  intercepts  differ 
greatly between logarithmically and linearly spaced scales with the intercepts
of log scales being closer to the correct 0.0. This is true regardless

Table 4

Correlations Between True Likelihood Ratios
and Responses for Individual Subjects
(Experiment II)

Linear


10,000:1 


Normal 


2:1  to  12,000:1  Normal 


.5  3. 


2:1  to  12.000:1 


832 

.84 

812 

.87 

637 

.86 

821 

.71 

756 

.950  (.094) 

.957  (-.772) 
.767  (.151) 

(.012)  (-.898) 


850 
804 
887 
787 
813 
697 
920 
802 

.838  (.135) 
.603  (.236) 
.881  (-.239) 
.816  (-.537) 


.854  .603 
.688  .881 
.701  .827 
.914  .578 
.881  .728 
.821  .757 
.513  .897 
.838  .637 
.806  .836 
.539  .826 
.695  .760 
.660  (.323) 


.674  .84 

.777  .83 


.870  .71 

.691  .79 

.961  .51 

.774  .75 

.707  .87 

.760  .83 

.879  .439 

.869  .439 

.605  (.227) 

(.290)  (.358) 


Averages  .885 


Logarithmic 


10,000:1 


E=  RANGE=  RANGE 

Normal  2:1  to  12,000:1  Normal  2:1  to  12,000:1 


.5  3.0 


941  .740 

848  .783 

914  .398 

946  .523 

986  .927 

867  .936 

846  .782 

495  .930 

735  .934 

.720  .881  .663  .673 


486 

.97 

774 

.83i 

585 

.90 

807 

.86 

420 

.80 

827 

.87' 

817 

.91 

958 

.72 

619 

.55! 

.402  (.145)  .674  .828 

(-.635)  (.319) (-.612)  (.320) 


.922  .844  .824  (.031) 

(-.795)  .480  .789  (-.540) 

(.290)  .680  (.160)  (.097) 


Averages  .861 


Note:  Nonsignificant  correlations  are  removed  from  averages. 
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Table  5 


Average  Slopes  and  Intercepts  of  Responses 
Versus  True  Likelihood  Ratios 
(Experiment  II) 


                               Linear                  Logarithmic

                         100:1      10,000:1        100:1      10,000:1

RANGE=Normal

  d'=1.5               b= .851      b=1.000       b= .674      b=1.503
                       a= .576      a=2.624       a= .490      a= .625

  d'=3.0               b= .270      b= .266       b= .265      b= .363
                       a=1.048      a=2.720       a= .619      a=1.903

RANGE=2:1 to 12,000:1

  d'=1.5               b= .227      b= .369       b= .389      b= .679
                       a=1.100      a=2.466       a= .547      a=1.037

  d'=3.0               b= .348      b= .251       b= .249      b= .542
                       a= .887      a=2.976       a= .881      a= .889

Note:  Subjects  with  nonsignificant  correlations  between  true  likelihood 
ratios  and  response  likelihood  ratios  are  not  represented  in  the 
calculations in this table. Slopes are represented by b, intercepts
by a.
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of  the  range  of  the  true  likelihood  ratios  or  the  d'  condition  in  which  the 
subject responded. The slopes of responses on logarithmically spaced scales
and linearly spaced scales also differ, with the average slope for
logarithmically spaced scales closer to the optimal value of 1.0. Endpoints also
affected  slopes:  scales  with  an  upper  endpoint  of  10,000:1  have  an  average 
slope  closer  to  1.0. 

Medians  were  calculated  across  subjects  for  response  mode  groups  at 
each  level  of  likelihood  ratio  and  these  medians  were  regressed  on  true 
likelihood  ratios.  These  correlations,  slopes  and  intercepts  are  broken  down 
by  factors  in  Table  6.  Logarithmic  scales  seem  to  be  superior  to  linear 
scales as evidenced by higher correlations, slopes closer to 1.0 and
intercepts closer to zero, but these criteria may not completely reflect the
accuracy of the judgments. A question arises in the evaluation of the
regression analysis in the case where either the slope is less than 1.0 and
the  intercept  greater  than  0.0,  or  the  slope  is  greater  than  1.0  and  the 
intercept  is  less  than  0.0.  In  either  case,  the  subject  may  be  making 
responses  in  the  correct  range  of  true  values,  but  the  deviation  might 
reflect  some  specific  bias  such  as  avoidance  of  high  and  low  range  responses. 
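The ambiguity described above can be illustrated with a minimal sketch. It assumes the regressions were run on base-10 logarithms of the responses and of the true likelihood ratios (the base is not stated here). The response values are hypothetical, chosen only so that the slope falls below 1.0 while the intercept lies above 0.0, and the name `fit_log_log` is an illustrative choice.

```python
import math

def fit_log_log(true_lrs, responses):
    """Least-squares fit of log10(response) on log10(true LR).

    Returns (r, b, a): correlation, slope, intercept.
    """
    xs = [math.log10(v) for v in true_lrs]
    ys = [math.log10(v) for v in responses]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope; 1.0 for veridical judgments
    a = my - b * mx                  # intercept; 0.0 for veridical judgments
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation
    return r, b, a

# Hypothetical median responses (not data from the experiments):
# small likelihood ratios overestimated, large ones underestimated.
true_lrs = [2, 10, 100, 1000, 10000]
compressed = [10, 20, 100, 500, 1000]

r, b, a = fit_log_log(true_lrs, compressed)

# Likelihood ratio at which the fitted line crosses the identity line,
# i.e. where the fitted response equals the true value (requires b != 1).
crossover = 10 ** (a / (1 - b))
```

In this hypothetical case the fitted line crosses the identity line at `crossover`, so smaller likelihood ratios are overestimated and larger ones underestimated even though every response lies within the range of true values. Slope and intercept alone therefore do not settle how accurate the judgments are.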

Scales  with  upper  endpoints  of  10,000:1  had  a somewhat  higher  correlation 
between  response  likelihood  ratios  and  the  true  likelihood  ratios,  but  the 
superiority  of  the  slope  of  scales  with  either  endpoint  was  not  definitive 
in  the  light  of  the  extremely  high  intercepts  for  those  scales.  Subjects 
could  well  be  radical  in  their  judgments  when  using  the  higher  endpoint, 
despite  the  slope  being  less  than  1.0. 

To  investigate  this  possibility,  an  analysis  of  variance  was  done  on 
difference  scores  calculated  as  in  Experiment  I.  Table  7 shows  the  means 
for this ANOVA. Significant differences were found for both endpoints and
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Table  6 


Correlations,  Slopes,  and  Intercepts 
Median  Responses  versus  True  Likelihood  Ratios 
(Experiment  II) 


                               Linear                  Logarithmic

                         100:1      10,000:1        100:1      10,000:1

RANGE=Normal

  d'=1.5               r= .898      r= .846       r= .891      r= .962
                       b=1.009      b= .770       b= .749      b=1.404
                       a= .537      a=2.950       a= .419      a= .479

  d'=3.0               r= .842      r= .877       r= .940      r= .789
                       b= .207      b= .291       b= .345      b= .432
                       a=1.205      a=2.836       a= .522      a=1.995

RANGE=2:1 to 12,000:1

  d'=1.5               r= .857      r= .836       r= .893      r= .950
                       b= .201      b= .334       b= .375      b= .646
                       a=1.236      a=2.860       a= .541      a= .938

  d'=3.0               r= .886      r= .848       r= .784      r= .945
                       b= .275      b= .217       b= .274      b= .522
                       a=1.114      a=3.206       a= .945      a=1.018

Note:  Subjects  with  nonsignificant  correlations  between  true  likelihood 
ratios  and  response  likelihood  ratios  are  not  represented  in  the 
calculations  in  this  table.  Correlations  are  represented  by  r, 
slopes  by  b,  and  intercepts  by  a. 
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Table  7 


Mean  Absolute  Deviations  Between  Log  Responses 
and  Log  True  Likelihood  Ratios 
(Experiment  II) 


                              Linear              Logarithmic

                         100:1   10,000:1      100:1   10,000:1   Marginal Means

RANGE=Normal

  d'=1.5                  .618     2.628        .385     1.083        1.179

  d'=3.0                  .925     1.171       1.236     1.077        1.102

RANGE=2:1 to 12,000:1

  d'=1.5                  .836     1.340        .945      .830         .988

  d'=3.0                  .743     1.369        .922      .624         .915

Marginal Means            .781     1.627        .872      .904

Note:  Subjects  with  nonsignificant  correlations  between  Log  Response 
and  Log  True  are  not  included  in  this  table. 
spacing (p<.001). Logarithmic scales and scales with endpoints of 100:1
result in responses which are significantly closer to the true values. No significant
difference  was  found  for  subjects  under  differing  d'  conditions,  but 
subjects'  assessments  were  more  veridical  when  responding  to  a normal  range 
of  true  likelihood  ratios  than  when  the  likelihood  ratios  were  arbitrarily 
chosen  to  cover  the  range  from  2:1  to  12,000:1. 
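The deviation measure and the marginal means in Table 7 can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: base-10 logarithms are assumed, the function name `mean_abs_log_deviation` is hypothetical, and the marginal means are taken as unweighted averages of the cell means, which reproduces the tabled values to within rounding.

```python
import math

def mean_abs_log_deviation(true_lrs, responses):
    """Mean of |log10(response) - log10(true LR)| over paired judgments."""
    devs = [abs(math.log10(resp) - math.log10(true))
            for true, resp in zip(true_lrs, responses)]
    return sum(devs) / len(devs)

# Cell means copied from Table 7.
# Rows: Normal d'=1.5, Normal d'=3.0, 2:1 to 12,000:1 d'=1.5, 2:1 to 12,000:1 d'=3.0.
# Columns: linear 100:1, linear 10,000:1, logarithmic 100:1, logarithmic 10,000:1.
table7 = [
    [0.618, 2.628, 0.385, 1.083],
    [0.925, 1.171, 1.236, 1.077],
    [0.836, 1.340, 0.945, 0.830],
    [0.743, 1.369, 0.922, 0.624],
]

# Unweighted marginal means (assumes equal weight per cell).
row_means = [sum(row) / len(row) for row in table7]
col_means = [sum(col) / len(col) for col in zip(*table7)]
```

A deviation of 1.0 on this measure corresponds to a response that is off by a factor of ten in either direction, so the linear 10,000:1 column mean of about 1.63 reflects errors well beyond an order of magnitude on average.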

Several  interactions  were  significant  but  the  magnitude  of  the  effects 
was  generally  minimal  except  for  the  endpoint  by  spacing  interaction  which 
accounted  for  10.3%  of  the  variance.  Other  factors  which  accounted  for 
appreciable  amounts  of  the  variance  were  endpoint  (13.1%),  spacing  (7.6%) 
and  the  d'  by  endpoint  interaction  (7.8%).  The  magnitude  of  these  effects 
may be contrasted with the main effect of the range which, although
significant (p<.001), accounted for only 2.7% of the variance.
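The text does not say which index of "variance accounted for" was used. One common choice is the ratio of an effect's sum of squares to the total sum of squares. The sketch below computes that ratio for a balanced two-factor design; the function name, the reduction to two factors, and the scores are all hypothetical and serve only to show the arithmetic.

```python
def percent_variance(cells):
    """Percent of total sum of squares for factors A, B, A x B and error.

    cells[i][j] is a list of scores for level i of factor A and level j
    of factor B. A balanced design (equal n per cell) is assumed.
    """
    a_levels = len(cells)
    b_levels = len(cells[0])
    n = len(cells[0][0])

    scores = [x for row in cells for cell in row for x in cell]
    grand = sum(scores) / len(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)

    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    a_mean = [sum(cell_mean[i]) / b_levels for i in range(a_levels)]
    b_mean = [sum(cell_mean[i][j] for i in range(a_levels)) / a_levels
              for j in range(b_levels)]

    ss_a = n * b_levels * sum((m - grand) ** 2 for m in a_mean)
    ss_b = n * a_levels * sum((m - grand) ** 2 for m in b_mean)
    ss_cells = n * sum((cell_mean[i][j] - grand) ** 2
                       for i in range(a_levels) for j in range(b_levels))
    ss_ab = ss_cells - ss_a - ss_b      # interaction
    ss_error = ss_total - ss_cells      # within-cell variation

    parts = {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error}
    return {name: 100 * ss / ss_total for name, ss in parts.items()}

# Hypothetical deviation scores: factor A = endpoint, factor B = spacing.
example = [
    [[0.5, 0.7], [0.8, 1.0]],
    [[1.4, 1.6], [0.8, 1.0]],
]
shares = percent_variance(example)
```

Because the four components partition the total sum of squares in a balanced design, the percentages add to 100, which is what allows a statement such as "endpoint accounted for 13.1% of the variance" to be compared directly with the share attributed to an interaction.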


III.3.  Discussion

Response-mode-produced  biases  in  subjects'  likelihood  ratio  judgments 
appear  to  be  pervasive.  The  amount  and  specific  dimensions  of  the  biases 
are  primarily  dependent  upon  the  characteristics  of  the  response  mode  as 
well  as  the  exact  nature  of  the  task  and  data  generator.  Logarithmically 
spaced  scales  generally  seem  to  result  in  responses  being  significantly 
closer to the true response. This may be because logarithmic scales
facilitate the use of responses near 1:1. Or, subjects may use (probably
unconsciously) the fact that distances on logarithmic scales should be linearly
related  to  the  value  of  the  random  variable  serving  as  the  stimulus.  This 
follows  from  the  true  likelihood  being  an  exponential  function  of  the  random 
variable. 
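The exponential relationship can be made explicit under one assumption that this section does not confirm: that the data generator consists of two normal densities, f_0 and f_1, with a common standard deviation σ and means μ_0 and μ_1 separated by d' standard deviations. The symbols μ_0, μ_1, σ, and L(x) are introduced here for illustration only.

```latex
\begin{align*}
L(x) &= \frac{f_1(x)}{f_0(x)}
      = \exp\!\left[\frac{(x-\mu_0)^2 - (x-\mu_1)^2}{2\sigma^2}\right] \\
     &= \exp\!\left[\frac{\mu_1-\mu_0}{\sigma^2}
        \left(x - \frac{\mu_0+\mu_1}{2}\right)\right]
      = \exp\!\left[\frac{d'}{\sigma}
        \left(x - \frac{\mu_0+\mu_1}{2}\right)\right],
      \qquad \mu_1 - \mu_0 = d'\sigma, \\[4pt]
\log L(x) &= \frac{d'}{\sigma}\left(x - \frac{\mu_0+\mu_1}{2}\right).
\end{align*}
```

Under this assumption the log likelihood ratio is a linear function of the observed value x, so equal steps in x correspond to equal distances along a logarithmically spaced scale, and the steps grow in proportion to d'.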
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Differences  in  responses  resulting  from  upper  endpoints  of  100:1  and 
10,000:1 reflect a general tendency for subjects to maintain a larger
magnitude of response when a larger upper endpoint is used. The upper endpoint
may  serve  as  a ceiling  for  responses,  for  example,  producing  judgments  on  the 
100:1  scales  which  would  never  exceed  that  upper  bound.  Such  a simple 
explanation cannot, however, explain why responses on scales with 100:1
endpoints are more accurate than responses on scales with 10,000:1 endpoints,
even  with  d'=3.0  and/or  true  likelihood  ratios  ranging  from  2:1  to  12,000:1. 
In  these  conditions,  the  relatively  large  number  of  true  likelihood  ratios 
larger  than  100:1  would  suggest  that  scales  with  endpoints  of  10,000:1 
should  lead  to  more  accurate  responses. 

On  the  other  hand,  larger  upper  endpoints  could  be  perceived  by  the 
subjects  as  conveying  information  as  to  the  range  of  likely  values  in  which 
their  judgments  should  fall.  Larger  endpoints  may  suggest  generally  larger 
likelihood  ratios,  thus  leading  to  considerable  overestimation  of  small  and 
middle range true likelihood ratios. The larger intercepts of responses
on scales with 10,000:1 endpoints exemplify this possibility.

As  in  the  first  experiment,  findings  are  consistent  with  the  results 
of  Domas  et  al.  (1972)  in  that  slopes  of  the  regression  lines  comparing 
response  likelihood  ratios  with  true  likelihood  ratios  are  larger  for  scales 
with  logarithmic  spacing.  Still,  despite  the  addition  of  a less  extreme 
d'  in  Experiment  II,  slopes  remain  less  than  1.0  in  most  cases,  as  opposed 
to  the  Domas  et  al.  study  where  slopes  were  generally  greater  than  1.0. 

Nevertheless, d' cannot be ruled out completely as a contributing factor, since
Domas  et  al.  used  levels  of  d',  .46  to  1.14,  which  reflected  relatively 
undiagnostic  data. 


In  summary,  response  scales  have  been  shown  to  be  a consistent  factor 
when  subjects  are  making  likelihood  ratio  judgments.  Although  logically 
irrelevant  to  the  judgments  being  made,  both  the  magnitude  of  likelihood 
ratios  presented  on  the  scale  and  the  spacing  of  those  ratios  contribute 
to  systematic  biases  in  the  subjects'  responses.  The  results  of  Experiment 
II  substantiated  the  findings  of  Experiment  I as  to  the  effects  of  endpoint 
and  spacing  of  response  scales.  Experiment  II  went  further  to  show  that 
these  results  could  not  be  attributed  to  either  the  effect  of  the  extreme 
d'  or  the  extreme  nature  of  the  true  likelihood  ratios  in  Experiment  I. 
Generally,  subjects  were  better  able  to  estimate  the  likelihood  ratios  when 
they  were  responding  on  logarithmically  spaced  scales.  Further,  subjects' 
performance  was  somewhat  improved  when  the  upper  endpoint  was  less  than  the 
highest  one  presented  in  these  two  studies  (10,000:1). 

When  the  types  of  judgments  involved  in  these  studies  are  necessary 
inputs  to  decision  making,  the  biases  encountered  here  should  be  taken  into 
account  when  deciding  how  the  judgments  are  to  be  elicited.  The  results 
of these two studies show that consideration should be given to the
diagnosticity of the data with which the person making the judgment will be
dealing  as  well  as  the  range  of  the  true  likelihood  ratios  he  or  she  is 
likely  to  encounter. 
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